GERMLINE MUTATIONS IN THE E-CADHERIN GENE AND METHOD
FOR DETECTING PREDISPOSITION TO CANCER 

This invention relates to methods by which a predisposition to cancer can be 
determined. In particular, it relates to methods for detecting whether a patient has
a predisposition to cancer, particularly hereditary diffuse gastric cancer. 

BACKGROUND 

The key to cancer treatment is early detection. The ability to predict who is at
extreme risk, before the appearance of clinical symptoms, will enable the earliest 
possible detection of malignancy (watchful waiting). It will also enable prophylactic 
intervention prior to the onset of clinical signs. 

It is therefore the object of this invention to provide a predictive method by which
susceptibility to cancer, particularly gastric cancer, can be determined or at least to 
provide the public with a useful choice. 

Gastric cancer remains a major cause of cancer death worldwide, and about 10% of 
cases show familial clustering. The relative contributions of inherited susceptibility
and environmental effects to familial gastric cancer are poorly understood because 
little is known of the genetic events that predispose to gastric cancer. 

The identification of genes predisposing to familial cancer is therefore an essential 
step towards understanding the molecular events underlying tumourigenesis and is
critical for the clinical management of affected families. 

The applicants have identified a gene in individuals which, when mutated, 
predisposes that individual towards developing cancer, particularly hereditary 
gastric cancer. It is this finding, and the implications it has for cancer screening
and management (particularly for families with a history of familial cancer) which 
underlies the present invention. 

SUMMARY OF THE INVENTION

Accordingly, in a first aspect, the invention broadly provides a method of testing to 
detect whether an individual is predisposed to cancer which comprises the step of 
detecting the presence or absence of an alteration (mutation) in the gene encoding
E-cadherin. 

In a further aspect, the invention provides a method of assessing the risk in a 
human subject for a predisposition for cancer which comprises the step of 
determining whether there is a germline alteration in the gene encoding E-cadherin,
wherein the presence of an alteration is indicative of a risk for a predisposition for 
cancer. 

As used herein, "gene encoding E-cadherin" means not only the coding sequence for
wild-type E-cadherin but also includes non-coding flanking sequences and
regulatory elements, mutations in which can cause transcript instability and/or
transcriptional repression, and the sites for transcript splicing. These include the
two nucleotides immediately upstream (usually "AG") and the two nucleotides
immediately downstream (usually "GT") of each exon, and also the splicing branch
site located 18-38 bp upstream of each exon.
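
By way of illustration only, the splice-site positions named in this definition can be inspected programmatically. The following Python sketch assumes a sense-strand genomic sequence and 0-based exon coordinates; the function names and the toy sequence are hypothetical and form no part of the specification.

```python
def splice_site_dinucleotides(genomic, exon_start, exon_end):
    """Return the two bases immediately upstream and downstream of an exon.

    genomic    -- genomic DNA string (sense strand)
    exon_start -- 0-based index of the first exon base
    exon_end   -- 0-based index one past the last exon base
    """
    if exon_start < 2 or exon_end + 2 > len(genomic):
        raise ValueError("exon must have at least two flanking bases on each side")
    upstream = genomic[exon_start - 2:exon_start].upper()   # acceptor site, usually "AG"
    downstream = genomic[exon_end:exon_end + 2].upper()     # donor site, usually "GT"
    return upstream, downstream


def has_canonical_splice_sites(genomic, exon_start, exon_end):
    """True when the exon is flanked by the usual AG ... GT dinucleotides."""
    return splice_site_dinucleotides(genomic, exon_start, exon_end) == ("AG", "GT")


# Hypothetical toy sequence: intron (lower case), exon (upper case), intron.
toy = "ttttcag" + "ATGGCCGAG" + "gtaagt"
print(has_canonical_splice_sites(toy, 7, 16))
```

A change in either flanking dinucleotide would make the check return False, which is the kind of alteration the definition above brings within the term "gene encoding E-cadherin".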

In one (preferred) embodiment, the presence or absence of the mutation is detected 
through analysis of the DNA encoding E-cadherin and/or its regulatory elements. 

In an alternative embodiment, the presence or absence of the mutation is detected
through analysis of mRNA transcribed from the DNA encoding E-cadherin. 

In still a further embodiment, the presence or absence of the mutation is detected 
through analysis of the amino acid sequence of the expressed E-cadherin protein. 

As a separate embodiment, the invention provides a method of prophylaxis and/or
therapeutic treatment against cancer of an individual identified as having a risk of
predisposition to cancer by a method defined above which comprises the step of
increasing, maintaining and/or restoring the active concentration of wild-type
E-cadherin protein within said individual.

Conveniently, the method will be a gene therapy method and will involve supplying 
the individual with wild-type E-cadherin gene function.

DESCRIPTION OF THE DRAWINGS 

While the invention is broadly as defined above, it will be appreciated that it is not 
limited thereto but that it also includes embodiments of which the following
description provides examples. In addition, the invention will be better understood 
through reference to the accompanying drawings in which:

Figure 1 shows the nucleotide and amino acid sequences for wild-type
E-cadherin cDNA;

Figure 2 is a kindred map for one family (family A) having a predisposition
to gastric cancer. Numbers to the right of the symbols indicate age at
death. The age is underlined if a blood or biopsy sample was available.
General symbols: squares, males; circles, females; all symbols with a
diagonal, deceased. Solid symbols: gastric carcinoma, pathology
available; dotted symbols: gastric carcinoma, pathology unavailable;
vertical stripes: colorectal cancer;

Figure 3 is a graph showing the age of death from gastric cancer in the
studied kindred of family A;

Figure 4 shows the results of a mutation analysis of exon 7 of the
E-cadherin gene as follows:

(a) SSCP pattern of exon 7 in the E-cadherin gene. The SSCP band
patterns of two affected people, two obligate carriers and two
unaffected spouses (wild type) are shown. The additional band in
the affected and obligate carrier samples is indicated by the arrow;

(b) Direct sequence analysis of the exon-intron boundary of exon 7
showing the wild-type sequence and the sequence from an affected
person heterozygous for the G to T transversion. The position of
the exon/intron boundary is marked;

Figure 5 is an abbreviated kindred map for a second family (family B).
General symbols: squares, males; circles, females; all symbols with a
diagonal, deceased. Solid symbols: gastric carcinoma, pathology
available; dotted symbols: gastric carcinoma, pathology unavailable;
vertical stripes: colorectal cancer. Diagonal hatching: unconfirmed gastric
carcinoma;

Figure 6 shows sequence analysis results for DNA from family B (Figure
6A) and family C (Figure 6B), exons 15 and 13 respectively;

Figure 7 shows pedigrees of non-Maori gastric cancer families. General
symbols: squares, males; circles, females; all symbols with a diagonal,
deceased. Patient numbers are included; and

Figure 8 shows mutations in gastric cancer families. (a) Exon 11 DNA
sequence from family 1000 showing the insertion of an additional C
nucleotide between the G at position 1588 and the A at position 1591. (b)
Exon 2 sequence from family 4201 showing the heterozygous (G/T)
mutation at position 70. (c) Exon 8 / intron 8 sequence of family CHG 72
showing the heterozygous (G/A) mutation at the first nucleotide of the
intron. Nucleotide positions are as described in Berx et al (1995).
Sequencing products were analysed on a LiCor 4000L DNA sequencer.

DESCRIPTION OF THE INVENTION

As defined above, the method of the invention detects a predisposition to cancer. 
The critical finding made by the applicants is that this predisposition is due to an 
alteration (mutation) in the gene encoding E-cadherin. This finding forms the basis
of the present invention. 

E-cadherin is a transmembrane protein with five tandemly repeated extracellular
domains and a cytoplasmic domain which connects to the actin cytoskeleton via a
complex with α, β and γ catenins (Grunwald (1993)). It plays an important role in
establishing cell polarity and maintaining normal tissue morphology and cellular
differentiation. Diminished E-cadherin expression is associated with poorly
differentiated carcinomas which display aggressive histopathologic characteristics
such as infiltrative growth and lymph node involvement (Shiozaki et al (1995)).
Under-expression has been proposed as a prognostic marker of poor clinical
outcome in many tumour types (Bracke et al (1996)). In experimental tumour
models, restored expression of E-cadherin can suppress the invasiveness of
epithelial tumour cells (Frixen (1991), Vlemincke (1991)).

However, to date, there has been no suggestion that an alteration/mutation in the
gene encoding E-cadherin is in any way predictive of susceptibility to cancer prior to 
tumourigenesis. 

The amino acid and cDNA nucleotide sequences encoding wild-type E-cadherin are
shown in Figure 1. Any change in either sequence is included in the scope of the
term "mutation" as used herein. 

The gene encoding E-cadherin was identified as a susceptibility gene through 
genetic linkage analysis. This analysis was performed in relation to samples
obtained from a large (Maori) kindred from New Zealand, the pedigree pattern of 
which is shown in Figure 2 (family A). This pedigree pattern is consistent with the 
dominant inheritance of a susceptibility gene with incomplete penetrance. 

The linkage analysis determined that the susceptibility to cancer was associated
with the gene encoding E-cadherin. This was confirmed with reference firstly to two 
further Maori kindreds (families B and C) and then to non-Maori kindreds. 

In one approach, according to the present invention, alteration of the wild-type
E-cadherin gene is detected. In addition, the method can be performed by detecting
the wild-type E-cadherin gene and confirming the lack of a predisposition or 
neoplasia. 

"Alteration of a wild-type E-cadherin gene" encompasses all forms of mutations 
including deletions, insertions and point mutations in the coding and noncoding 
regions. Deletions may be of the entire gene or only a portion of the gene. Point 
mutations may result in stop codons, frame shift mutations or amino acid 
substitutions.
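
By way of illustration only, the categories of alteration listed above can be distinguished with a short Python sketch. It assumes the standard genetic code; the function names are hypothetical, and the example codons (GAG to GAT, CAG to TAG) are chosen merely to show one amino acid substitution and one stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(wild_codon, mutant_codon):
    """Classify a point mutation within one codon by its effect on the protein."""
    wild_aa = CODON_TABLE[wild_codon.upper()]
    mutant_aa = CODON_TABLE[mutant_codon.upper()]
    if mutant_aa == wild_aa:
        return "silent"
    if mutant_aa == "*":
        return "stop codon"
    return "amino acid substitution"


def classify_indel(length_in_bases):
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return "in-frame" if length_in_bases % 3 == 0 else "frame shift"


print(classify_codon_change("GAG", "GAT"))  # Glu -> Asp
print(classify_codon_change("CAG", "TAG"))  # Gln -> stop
print(classify_indel(1))                    # single-base insertion
```

The sketch considers coding-sequence effects only; alterations in regulatory or splice-site sequences, which are equally within the definition above, are not modelled.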

The alterations or mutations which are the focus of the predictive method of the
invention are germ line mutations. Germ line mutations can be found in any of a 
body's tissues and are inherited. 

Mutations leading to non-functional gene products primarily lead to a cancerous 
state. However, mutations which lead to decreased expression of the E-cadherin 
gene product will also lead to a cancerous condition. Point mutation events may 
occur in regulatory regions, such as in the promoter of the gene, leading to loss or 
diminution of expression of the mRNA. Point mutations may also abolish proper
RNA processing, leading to loss of expression of the E-cadherin gene product, or a 
decrease in mRNA stability or translation efficiency. 

Predisposition to cancers, such as diffuse gastric cancer and the other cancers 
identified herein, can be ascertained by testing any tissue of a human for mutations
of the E-cadherin gene. For example, a person who has inherited a germline
E-cadherin mutation would be prone to develop cancers. This can be determined by
testing DNA from any sample from the person's body such as serum, sputum and 
urine. Most simply, blood can be drawn and DNA extracted from the cells of the 
blood. In addition, prenatal diagnosis can be accomplished by testing fetal cells,
placental cells or amniotic fluid for mutations of the E-cadherin gene. 

A preliminary analysis to detect deletions in DNA sequences can be performed by 
looking at a series of Southern blots of DNA cut with one or more restriction 
enzymes, preferably a large number of restriction enzymes. Each blot contains DNA
from a series of normal individuals and from a series of test cases. Southern blots
displaying hybridizing fragments (differing in length from control DNA when probed
with sequences near or including the E-cadherin locus) indicate a possible
mutation. If restriction enzymes which produce very large restriction fragments are
used, then pulsed field gel electrophoresis ("PFGE") can be employed. 

Detection of point mutations may be accomplished by molecular cloning of the
E-cadherin allele(s) and sequencing that allele(s) using techniques well known in the
art. Alternatively, the gene sequences can be amplified, using known polynucleotide
amplification techniques, directly from a genomic DNA preparation from the sample
tissue. The amplification techniques which can be used include methods such as
the polymerase chain reaction (PCR), ligation amplification (or ligase chain reaction,
LCR) and amplification methods based on the use of Q-beta replicase. These
methods are well known and widely practised in the art. See, eg., US Patents
4,683,195 and 4,683,202 and Innis et al, 1990 (for PCR); and Wu et al, 1989a (for
LCR). Reagents and hardware for conducting amplification are commercially
available. Primers useful to amplify sequences from the E-cadherin region are
preferably complementary to, and hybridize specifically to, sequences in the
E-cadherin region or in regions that flank a target region therein.

E-cadherin sequences generated by amplification may be sequenced directly. 
Alternatively, but less desirably, the amplified sequence(s) may be cloned prior to
sequence analysis. A method for the direct cloning and sequence analysis of
enzymatically amplified genomic segments has been described by Scharf, 1986.

There are numerous well known methods for confirming the presence of a
susceptibility allele. These include: 1) single-stranded conformation analysis
("SSCA") (Orita et al, 1989); 2) denaturing gradient gel electrophoresis ("DGGE")
(Wartell et al, 1990; Sheffield et al, 1989); 3) RNase protection assays (Finkelstein
et al, 1990; Kinsler et al, 1991); 4) allele-specific oligonucleotides (ASOs) (Conner
et al, 1983); 5) the use of proteins which recognize nucleotide mismatches, such
as the E. coli mutS protein (Modrich, 1991); and 6) allele-specific PCR (Rano & Kidd,
1989). For allele-specific PCR, primers are used which hybridize at their 3' ends to a
particular E-cadherin mutation. If the particular E-cadherin mutation is not
present, an amplification product is not observed.
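
By way of illustration only, the principle of allele-specific PCR described above can be sketched in Python. The model is deliberately simplified: it assumes extension requires a matched 3'-terminal base and tolerates a small number of internal mismatches, and for convenience the primer is written in the same orientation as the template strand it is compared against. The sequences, the function name and the mismatch tolerance are all hypothetical.

```python
def allele_specific_primer_extends(primer, template, position, max_internal_mismatches=1):
    """Predict, in a simplified way, whether an allele-specific primer yields product.

    primer   -- primer sequence, 5'->3'
    template -- target sequence written 5'->3' in the same orientation as the primer
    position -- 0-based index in template where the primer's 5' end aligns
    """
    primer = primer.upper()
    site = template[position:position + len(primer)].upper()
    if len(site) != len(primer):
        return False  # primer runs off the end of the template
    # The 3'-terminal base must match, otherwise no extension is assumed.
    if site[-1] != primer[-1]:
        return False
    internal_mismatches = sum(1 for p, s in zip(primer[:-1], site[:-1]) if p != s)
    return internal_mismatches <= max_internal_mismatches


# Hypothetical alleles differing by a single G -> T change at index 11.
wild_type = "ACGTTGCAAGCGTTAGC"
mutant = "ACGTTGCAAGCTTTAGC"
mutant_primer = "TTGCAAGCT"  # 3' base corresponds to the mutant allele only

print(allele_specific_primer_extends(mutant_primer, mutant, 3))
print(allele_specific_primer_extends(mutant_primer, wild_type, 3))
```

In this sketch a product is predicted for the mutant allele and not for the wild-type allele, mirroring the statement that no amplification product is observed when the mutation is absent.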

Other approaches which can also be used include the Amplification Refractory
Mutation System (ARMS), as disclosed in European Patent Application Publication
No. 0332435 and in Newton et al, 1989. Insertions and deletions of genes can also
be detected by cloning, sequencing and amplification. In addition, restriction
fragment length polymorphism (RFLP) probes for the gene or surrounding marker
genes can be used to detect alteration of an allele or an insertion in a polymorphic
fragment. Such a method is particularly useful for screening relatives of an affected
individual for the presence of the E-cadherin mutation found in that individual.

In the first three methods (ie., SSCA, DGGE and RNase protection assay), a new
electrophoretic band appears. SSCA detects a band which migrates differentially
because the sequence change causes a difference in single-strand, intramolecular
base pairing. RNase protection involves cleavage of the mutant polynucleotide into
two or more smaller fragments. DGGE detects differences in migration rates of
mutant sequences compared to wild-type sequences, using a denaturing gradient
gel. In an allele-specific oligonucleotide assay, an oligonucleotide is designed which
detects a specific sequence, and the assay is performed by detecting the presence or
absence of a hybridization signal. In the mutS assay, the protein binds only to
sequences that contain a nucleotide mismatch in a heteroduplex between mutant
and wild-type sequences.

Mismatches are hybridized nucleic acid duplexes in which the two strands are not
100% complementary. Lack of total homology may be due to deletions, insertions,
inversions or substitutions. Mismatch detection can be used to detect point
mutations in the gene or its mRNA product. While these techniques are less
sensitive than sequencing, they are simpler to perform on a large number of
samples.

An example of a mismatch cleavage technique is the RNase protection method. This 
method involves the use of a labeled riboprobe which is complementary to the 
human wild-type E-cadherin gene coding sequence. The riboprobe and either
mRNA or DNA isolated from the test tissue are annealed (hybridized) together and
subsequently digested with the enzyme RNase A which is able to detect some
mismatches in a duplex RNA structure. If a mismatch is detected by RNase A, it
cleaves at the site of the mismatch. Thus, when the annealed RNA preparation is
separated on an electrophoretic gel matrix, if a mismatch has been detected and
cleaved by RNase A, an RNA product will be seen which is smaller than the full
length duplex RNA for the riboprobe and the mRNA or DNA.

The riboprobe need not be the full length of the E-cadherin mRNA or gene but can 
be a segment of either. If the riboprobe comprises only a segment of the E-cadherin
mRNA or gene, it will be desirable to use a number of these probes to screen the 
whole mRNA sequence for mismatches. 

In similar fashion, DNA probes can be used to detect mismatches, through 
enzymatic or chemical cleavage. See, eg., Cotton et al, 1989; Shenk et al, 1975;
Novack et al, 1986. Alternatively, mismatches can be detected by shifts in the
electrophoretic mobility of mismatched duplexes relative to matched duplexes. See,
eg., Cariello, 1988. With either riboprobes or DNA probes, the cellular mRNA or DNA
which might contain a mutation can be amplified using PCR before hybridization.
Changes in DNA of the E-cadherin gene can also be detected using Southern
hybridization, especially if the changes are gross rearrangements, such as deletions 
and insertions. 

DNA sequences of the E-cadherin gene which have been amplified by use of PCR 
may also be screened using allele-specific probes. These probes are nucleic acid
oligomers, each of which contains a region of the E-cadherin gene sequence 
harboring a known mutation. For example, one oligomer may be about 30 
nucleotides in length, corresponding to a portion of the E-cadherin gene sequence. 
By use of a battery of such allele-specific probes, PCR amplification products can be 
screened to identify the presence of a previously identified mutation in the
E-cadherin gene.

Hybridization of allele-specific probes with amplified E-cadherin sequences can be 
performed, for example, on a nylon filter such as Hybond. Hybridization to a 
particular probe under stringent hybridization conditions indicates the presence of
the same mutation in the tumour tissue as in the allele-specific probe.

Mutations from potentially susceptible patients falling outside the coding region of 
E-cadherin can be detected by examining the non-coding regions, such as introns 
and regulatory sequences near or within the E-cadherin gene. An early indication 
that mutations in noncoding regions are important may come from Northern blot 
experiments that reveal messenger RNA molecules of abnormal size or abundance 
in cancer patients as compared to control individuals. 

Alteration of E-cadherin mRNA expression can be detected by any techniques 
known in the art. These include Northern blot analysis, PCR amplification and 
RNase protection. Diminished mRNA expression indicates an alteration of the
wild-type E-cadherin gene. Alteration of wild-type E-cadherin genes can also be detected
by screening for alteration of wild-type E-cadherin protein. For example, 
monoclonal antibodies immunoreactive with wild-type E-cadherin can be used to 
screen a tissue with lack of bound antigen indicating an E-cadherin mutation. 

Monoclonal antibodies with affinities of 10⁸ M⁻¹ or preferably 10⁹ to 10¹⁰ M⁻¹ or
stronger will typically be made by standard procedures as described, eg., in Harlow &
Lane, 1988 or Goding, 1986. Briefly, appropriate animals will be selected and the 
desired immunization protocol followed. After the appropriate period of time, the 
spleens of such animals are excised and individual spleen cells fused, typically, to 
immortalised myeloma cells under appropriate selection conditions. Thereafter, the 
cells are clonally separated and the supernatants of each clone tested for their 
production of an appropriate antibody specific for the desired region of the antigen. 

Other suitable techniques for preparing antibodies involve in vitro exposure of
lymphocytes to the antigenic polypeptides or, alternatively, selection of libraries
of antibodies in phage or similar vectors. See Huse et al, 1989.

Also, recombinant immunoglobulins may be produced using procedures known in 
the art (see, for example, US Patent 4,816,567). 

The antibodies may be used with or without modification. Frequently, antibodies
will be labeled by joining, either covalently or non-covalently, a substance which 
provides for a detectable signal. A wide variety of labels and conjugation techniques 
are known and are reported extensively in the literature. Suitable labels include 
radionuclides, enzymes, substrates, cofactors, inhibitors, fluorescent agents, 
chemiluminescent agents, magnetic particles and the like. Patents teaching the use 
of such labels include US Patents 3,817,837; 3,850,752; 3,939,350; 3,996,345; 
4,277,437; 4,275,149; and 4,366,241. 

Antibodies specific for products of mutant alleles could also be used to detect 
mutant E-cadherin gene product. Such antibodies can be produced in equivalent 
fashion to the antibodies for wild-type E-cadherin as described above. 

The immunological assay in which the antibodies are employed can involve any 
convenient format known in the art. Such formats include Western blots, 
immunohistochemical assays and ELISA assays. In addition, functional assays 
such as protein binding determinations, can also be used. 

In summary, any approach to detecting a germline alteration in the underlying DNA 
coding for wild-type E-cadherin expression can be employed, whether the analysis 
be of the DNA itself, mRNA transcribed from the DNA or the protein which is the 
ultimate expression product of the DNA. 

The following experimental sections outline the various analyses undertaken in 
detail. These identify a number of different mutations and are included for reasons 
of exemplification only. 

EXPERIMENTAL 

SECTION 1 - Familial gastric cancer in Maori kindreds (families A, B and C) 
Methods 

Genotyping: DNA extracted from blood and biopsy samples (Banerjee et al, (1995))
was genotyped using standard conditions (Dib, (1996)) in reactions containing 0.2U
AmpliTaq Gold (Perkin Elmer) and 25 pmole of infrared labelled (IR41) forward
primer (MWG Biotech). Products were analysed on a LiCor 4000L DNA sequencer.

SSCP analysis: SSCP mutation analysis was carried out as described by Berx et al,
(1995). The PCR products were electrophoresed at room temperature through a 6% 
non denaturing polyacrylamide gel without added glycerol. Products were detected 
by autoradiography. 

RT-PCR: Total RNA was extracted (Chomczynski et al, (1987)) from frozen biopsy
material and reverse transcribed using Superscript II (Gibco BRL) according to the
manufacturer's instructions. The region containing nucleotide position 1008 was
PCR-amplified from the cDNA using a forward primer within exon 7 (5'-TAA CAG GAA
CAC AGG AGT CAT CA-3') and a reverse primer from exon 8 (5'-GTG GTG GGA TTG AAG
ATC GG-3'). Reactions contained 4mM MgCl2 and 0.2U AmpliTaq Gold and were cycled as
follows: (95°C 10 min) 1 cycle and (95°C 15 sec, 57°C 45 sec, 72°C 10 sec) for 35
cycles.
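
By way of illustration only, basic properties of the two RT-PCR primers given above can be tabulated with a short Python sketch. The helper names are hypothetical, and the melting temperature uses the Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is an assumption made here for a rough estimate and is not a method stated in this specification.

```python
def clean(seq):
    """Remove the spacing used for readability and normalise case."""
    return seq.replace(" ", "").upper()


def gc_count(seq):
    """Number of G and C bases in the sequence."""
    return seq.count("G") + seq.count("C")


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule: 2(A+T) + 4(G+C)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


forward = clean("TAA CAG GAA CAC AGG AGT CAT CA")   # exon 7 forward primer
reverse = clean("GTG GTG GGA TTG AAG ATC GG")       # exon 8 reverse primer

for name, primer in (("forward", forward), ("reverse", reverse)):
    gc_percent = 100 * gc_count(primer) / len(primer)
    print(f"{name}: {len(primer)} nt, GC {gc_percent:.1f}%, Wallace Tm ~{wallace_tm(primer)} C")
```

The Wallace rule is only a coarse guide for oligonucleotides of this length; the annealing temperature actually used is the 57°C stated in the cycling conditions.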

Plasmid and direct sequencing: RT-PCR products were eluted from a 6%
polyacrylamide denaturing gel, re-amplified with the original primers using Pwo
polymerase (Boehringer Mannheim) and ligated into the EcoRV site of Bluescript.
Template for direct sequencing of mutations was produced from genomic DNA by
PCR using the SSCP antisense primers and the sense primers (Berx et al, 1995)
with an added 5' leader corresponding to the T3 sequencing primer. Plasmid and
direct sequencing were carried out using Thermosequenase (Amersham) and an
IR41 labelled (MWG Biotech) T3 primer (3 pmoles/reaction). The products were
analysed on a LiCor 4000L DNA sequencer.

Linkage analysis: Two point lod scores were calculated using MLINK of the 
LINKAGE 5.1 package (Lathrop et al, (1985)). A gene frequency of 10⁻⁴ was assumed
for the disease gene. Age dependent penetrance was taken into account; seven 
liability classes were obtained from the cumulative age of onset curve: 0.18 for 
individuals from 0-20yrs, 0.24 (21-25 years), 0.34 (26-30 years), 0.48 (31-35 years), 
0.56 (36-40 years), 0.64 (41-45 years) and 0.70 (>46 years). Variation of the 
maximum penetrance from 60-80% did not change the significance of the results. 
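
By way of illustration only, the seven liability classes set out above can be expressed as a lookup in Python. The function and variable names are hypothetical. The text gives the last class as ">46 years", which leaves age 46 unassigned; the sketch assumes, as a labelled assumption, that every age above 45 falls in the final class.

```python
# (upper age bound inclusive, penetrance) for the first six liability classes.
LIABILITY_CLASSES = [
    (20, 0.18),
    (25, 0.24),
    (30, 0.34),
    (35, 0.48),
    (40, 0.56),
    (45, 0.64),
]
MAX_PENETRANCE = 0.70  # final class; assumed here to cover every age above 45


def penetrance_for_age(age_years):
    """Return the age-dependent penetrance assigned to an individual."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    for upper_bound, penetrance in LIABILITY_CLASSES:
        if age_years <= upper_bound:
            return penetrance
    return MAX_PENETRANCE


for age in (18, 23, 33, 60):
    print(age, penetrance_for_age(age))
```

Varying MAX_PENETRANCE between 0.60 and 0.80 corresponds to the sensitivity check on maximum penetrance mentioned above.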

Results

Linkage analysis 

Reference should be made to Table 1 below which relates to family A. 



Table 1. Two point lod scores for linkage of the gastric cancer
susceptibility gene to markers mapping to the genetic
interval containing E-cadherin

Marker         Lod scores                           Recombination
               Equal allele      Kindred allele     fraction (θ)
               frequencies       frequencies

D16S752*       5.04              4.04               0
D16S3043**     2.01              2.34               0.05
D16S3019**     2.28              1.57               0
D16S3095**     4.90              4.07               0
D16S3083**     2.79              2.16               0
D16S3138**     3.32              2.68               0

Lod scores were calculated assuming either equal allele frequencies or, in a
conservative approach (in the absence of allele frequencies for the study
population), using the actual frequencies observed in the study kindred.

*  GDB (TM) Human Genome Database, Baltimore (Maryland, USA): Johns
   Hopkins University.
** Dib, 1996.

The linkage analysis found a maximum two-point lod score (Zmax = 5.04, θ = 0) with
marker D16S752, which maps within the genetic interval on chromosome 16q22.1
containing the E-cadherin gene (GDB, Human Genome Database, Baltimore,
Maryland, USA; Johns Hopkins University). Genotyping of five other markers (Dib,
(1996)) in the vicinity of E-cadherin identified additional significantly linked
markers. A conserved haplotype spanning 9 centimorgans from D16S3019 to
D16S3138 was consistently inherited with the disease. This haplotype was also
present in all obligate carriers of the susceptibility gene and a proportion of the 
unaffected individuals. The proportion of individuals with this haplotype who were 
affected by the age of 60 provided an approximation of 70% for the penetrance of 
the susceptibility gene in this kindred. 
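
By way of illustration only, the meaning of a two-point lod score can be shown with the simplified textbook formula for phase-known, fully informative meioses. This is not the calculation performed by MLINK, which evaluates the full pedigree likelihood with the penetrance classes and allele frequencies described above; the function name and the meiosis counts below are hypothetical.

```python
import math


def lod_score(n_meioses, n_recombinants, theta):
    """Two-point lod score Z(theta) for phase-known meioses.

    Z(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    where n is the number of informative meioses and k the number of recombinants.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    n, k = n_meioses, n_recombinants
    if not 0 <= k <= n:
        raise ValueError("recombinant count must lie between 0 and n_meioses")
    if theta == 0.0:
        # Any recombinant excludes theta = 0; otherwise each meiosis contributes log10(2).
        return float("-inf") if k > 0 else n * math.log10(2)
    return k * math.log10(theta) + (n - k) * math.log10(1 - theta) - n * math.log10(0.5)


# Hypothetical counts: 10 informative meioses, no recombinants, evaluated at theta = 0.
print(round(lod_score(10, 0, 0.0), 2))
```

Under this simplified model each non-recombinant meiosis adds about 0.3 to the lod score at θ = 0, which indicates why a large kindred is needed to reach the conventional significance threshold of 3.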

Mutation analysis - family A

Mutation analysis of samples from the kindred of Figure 2 (family A) using the 
single-stranded conformational polymorphism (SSCP) technique (Berx et al, 1995) 
revealed a band-shift in exon 7 (Fig. 4a) in DNA extracted from lymphocytes of two
affected people and four obligate carriers of the susceptibility gene. Direct
sequencing of exon 7 identified a G to T transversion at the last nucleotide (position
1008) of this exon (Fig. 4b). The SSCP band-shift was not observed in 150 unrelated
chromosomes (data not shown). 

Discussion - exon 7 mutation 

The consequences of the G to T transversion in exon 7 are two-fold. Firstly,
the mutated nucleotide forms part of the splice donor site consensus sequence
(Padgett et al, (1986)). Mutation of this splice site position results in exon skipping
and the activation of cryptic splice sites (Andrews et al, (1982); Kuivaniemi et al,
(1995)). Mutation of E-cadherin nucleotide 1008 (a G to A transition) has been
observed previously in a cell line derived from a histologically diffuse gastric
carcinoma (Kato III) (Oda et al, (1994)). This mutation resulted in the activation of
cryptic splice sites which led to premature chain termination. To determine the
extent to which transcript carrying the G to T transversion was incorrectly spliced,
exon-linking RT-PCR (exons 7-8) was performed on stomach biopsy material taken
from an affected family member. In addition to the expected product of 180 bp, a
minor 187 bp band was also observed. Both products were cloned and the resulting
clones sequenced. 10/10 clones derived from the larger band contained the
mutation and a 7 bp insertion of intronic DNA. The insertion is a consequence of
splicing at a cryptic splice site (Oda et al, (1994)). Since transcript which is
incorrectly spliced at exon 7 is unstable in vivo, the extent of aberrant splicing was
estimated from the proportion of correctly spliced transcript which contained the G
to T mutation. 1/14 clones derived from the 180 bp product contained the mutation.
This result demonstrates that, relative to the wild-type transcript, only about 15%
of the mutant transcript accumulates in stomach tissue.

The second consequence of the G to T transversion is the substitution of Glu 336 with
Asp (Berx et al, (1995)). Glu 336 is located in one of the LDRE motifs which form
part of E-cadherin's four calcium binding pockets. Calcium binding is required for
dimerisation and rigidification of E-cadherin and provides protection from
proteolytic degradation (Nagar et al, (1996)). Molecular modelling indicates that an
Asp at position 336 would cause a significant deformation in the calcium binding
pocket with a probable negative effect on its ability to bind calcium (data not
shown). The fact that the LDRE motif is conserved, not only amongst vertebrates
but also in Drosophila (Mahoney et al, (1991)), suggests that a Glu to Asp mutation
at this position is not tolerated.

Mutation analysis - confirmatory (families B and C)

To confirm the role of E-cadherin in inherited gastric cancer susceptibility, germ line 
mutations in this gene were searched for in two other Maori families (families B and 
C) with early-onset, histologically diffuse gastric cancer. SSCP analysis of exons
2-16 amplified from lymphocyte DNA was carried out on two affected individuals and
one obligate carrier from family B (Fig. 5) and the proband of family C. A band shift
was observed in exon 15 in the three members of family B who were tested. Direct
sequencing of exon 15 showed that all three individuals were heterozygous for the
insertion of an additional C residue in a run of five cytosines at positions
2,382-2,386 (Fig. 6A). The resulting frameshift leads to an E-cadherin molecule lacking
about half of its cytoplasmic domain.
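
By way of illustration only, the effect of inserting one additional C into a run of five cytosines can be demonstrated with a short Python sketch. The sequence used is a hypothetical toy open reading frame, not the E-cadherin exon 15 sequence, and the function name is likewise hypothetical; the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate complete codons from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical reading frame containing a run of five cytosines.
wild_type = "ATGGCACCCCCGAAGTTCTGA"
run_start = wild_type.index("CCCCC")
mutant = wild_type[:run_start] + "C" + wild_type[run_start:]  # one extra C in the run

print(translate(wild_type))
print(translate(mutant))
```

In the toy example the residues upstream of the insertion are unchanged, while every residue downstream differs and the original stop codon is no longer read in frame, which is the general behaviour of a single-base frameshift.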

The proband of family C (aged 30 years) showed an SSCP band shift in exon 13. Direct
sequencing identified a heterozygous C to T transition at nucleotide 2,095 which
converted Gln 699 to a TAG stop codon (Fig. 6B). This inactivating mutation would
result in an expressed E-cadherin peptide lacking both the transmembrane and
cytoplasmic domains.

Mutation Summary - families A, B and C 

The exemplary mutations identified to date in the three Maori kindreds are 
summarised in Table 2. In addition to the inactivating mutations in families A, B
and C, two silent mutations and one missense mutation which did not segregate 
with the phenotype were found (Table 2). 

Table 2. E-cadherin germline mutations and polymorphisms in Maori
gastric cancer families

Family     Nucleotide position   Mutation      Type
           (exon)

A          1,008 (7)             G -> T        Splice site
B          2,382-2,386 (15)      C insertion   Frameshift
C          2,095 (13)            C -> T        Premature termination (TAG)

B          1,409 (10)*           C -> T        Codon 470: Thr -> Ile
A, C       intron 12†            C -> T        Silent
A, B, C    2,076 (13)            C -> T        Silent

* This mutation did not segregate with the disease in family B.
† Located 13 nucleotides upstream of the exon.



SECTION 2 - Familial gastric cancer in non-Maori kindreds 



Material and Methods 



Description of families 

Family 1000 is of mixed Northern European ancestry (Fig. 7a). The proband and her
mother were both diagnosed with high grade adenocarcinoma with signet ring
histology and linitis plastica at ages 40 and 48, respectively. The proband's
maternal grandfather had died of cancer of unknown type at age 45. A maternal
aunt was diagnosed at age 59 with a scirrhous adenocarcinoma of the left breast. At
age 63 she also had resection of an adenocarcinoma of the cardia of the stomach.
Microscopic examination of the gastric tumour showed a diffuse, poorly
differentiated, mucus-producing adenocarcinoma with numerous signet ring cells.



Family 4201 (Fig. 7b) is of European origin. The family has a strong history of gastric
and breast cancer and leukemia. Pathology specimens were available from three of
four individuals affected by gastric cancer (III-1, III-2, III-5). These three cancers
were all diffusely infiltrative signet ring adenocarcinomas (Watanabe et al, (1990)).
Extensive thickening of the stomach wall, consistent with linitis plastica, was
described in one case (III-1). The age at diagnosis of gastric cancer in this family
ranged from 37 to 46 years and the age at death ranged from 39 to 55 years. One
obligate carrier is unaffected by cancer at age 71 years. However, her sister (II-2)
was diagnosed with gastric cancer at age 37 and breast cancer two years later. Two
cases of breast cancer alone, and one of Kaposi's sarcoma in the brain (associated
with HIV infection) have occurred in this family, with ages at diagnosis of 39, 46 and
40 years, respectively. The histology of these tumours was unavailable. In addition,
three family members had unspecified leukemia diagnosed at ages 66, 45, and 45
years. A fourth case of leukemia occurred in a spouse at age 83 years.

Family CHG 72 is of African American origin and has had four family members
affected by gastric cancer. Ages at diagnosis ranged from 25 to 58 years, with the
patients dying between ages 29 and 58. The tumours were all diffuse, poorly
differentiated infiltrative adenocarcinomas with signet ring histology. In addition to
these four cases, a half sister (II-1) to the proband died of an unconfirmed cancer in
her thirties and a child (IV-1) currently aged 10 years suffers from aplastic anemia.
The father of the four affected siblings (I-1) died at age 74 of an unknown illness.

DNA manipulation 

DNA was extracted from blood using either standard techniques or the Puregene kit
(Gentra Systems, Minneapolis, Minnesota) following the manufacturer's protocol.
DNA extractions from paraffin-embedded, formalin-fixed tissue were carried out
using previously reported techniques (Greer et al, (1995); Grady et al, (1998)). All
tumours from family 4201 and family CHG 72 were microdissected prior to DNA
extraction. PCR products for the 16 E-cadherin exons were amplified using 1U
AmpliTaq Gold (Perkin Elmer) and the primers and conditions described by Berx et
al (1996). A 5' leader corresponding to the T3 sequencing primer was added to the
sense primer. Direct sequencing of PCR products was carried out using
Thermosequenase (Amersham) and an IR800 labelled (MWG Biotech) T3 primer (3
pmoles/reaction). The products were analysed on a LiCor 4000L DNA sequencer.



Confirmation of the E-cadherin mutations in family 4201 and family CHG 72 was
performed on PCR products from the genomic DNA extracted from lymphocytes or
microdissected, paraffin-embedded tumour tissue using the Amplicycle kit (Perkin
Elmer) with α-33P-dCTP random priming. To improve the efficiency of PCR
amplification of exon 2 when using microdissected tumour DNA, the exon 2 primers
were redesigned to amplify a shorter PCR product which contained the region of
interest. These primers, 5'-TTC CCC CAC CCC AGG TCT C-3' (EX2F) and 5'-CCC
TCA CCT CTG CCC AGG AC-3' (EX2R), correspond to nucleotides 1-19 and 136-117
of the exon 2 genomic sequence (accession # L34937), respectively. Sequencing was
performed using either EX2F or the primer 5'-TGT AGC TCT CGG CGT CAA AG-3'
(complementary to nucleotides 93-112 of the E-cadherin cDNA sequence (Berx et al,
1995)). The sequencing products were electrophoresed on a 6% polyacrylamide 7M
urea gel at 70W (50°C) for 50-90 minutes and visualized using either
autoradiography or a Storm 820 Phosphorimager (Molecular Dynamics).
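
The stated primer coordinates can be checked with a short script. This is an illustrative sketch only: it assumes that the cDNA numbering of Berx et al (1995) starts at the A of the ATG codon, which is residue 95 of SEQ ID NO. 1, and the helper reverse_complement and the constant names are not taken from the specification.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


EX2F = "TTCCCCCACCCCAGGTCTC"
EX2R = "CCCTCACCTCTGCCCAGGAC"
SEQ_PRIMER = "TGTAGCTCTCGGCGTCAAAG"

# Residues 181-240 of SEQ ID NO. 1 (upper-cased), copied from the sequence listing.
CDNA_SEGMENT = "CCCTGGCTTTGACGCCGAGAGCTACACGTTCACGGTGCCCCGGCGCCACCTGGAGAGAGG"
SEGMENT_START = 181  # position of CDNA_SEGMENT[0] within SEQ ID NO. 1
ATG_POSITION = 95    # assumption: the A of the ATG start codon is cDNA nucleotide 1

# The sequencing primer is complementary to the cDNA, so its reverse
# complement should appear in the sense strand.
offset = CDNA_SEGMENT.find(reverse_complement(SEQ_PRIMER))
first = SEGMENT_START + offset - ATG_POSITION + 1
last = first + len(SEQ_PRIMER) - 1

print(len(EX2F), len(EX2R), first, last)
```

The primer lengths (19 and 20 nucleotides) agree with the spans 1-19 and 136-117, and under the numbering assumption above the sequencing primer anneals to cDNA nucleotides 93-112.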

Results 

Mutation Searching

All 16 E-cadherin exons were PCR amplified and sequenced. Sequencing of
peripheral white blood cell DNA from the proband of family 1000 identified the
heterozygous insertion of an additional cytosine after position 1588 (1588insC) in
exon 11 (Fig. 8a). This frameshift mutation is predicted to lead to premature
translation termination in exon 11. The truncated peptide would lack both one third
of the extracellular domain and the entire intracellular domain of the wild-type
E-cadherin protein. The heterozygous 1588insC mutation was also identified in DNA
from the proband's mother who had gastric cancer. This DNA had been extracted
from a biopsy of a metastasis to the diaphragm. The biopsy consisted of a mixture of
diffusely infiltrating tumour cells and normal stroma.
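
The predicted premature termination can be illustrated by applying the insertion to the relevant stretch of SEQ ID NO. 1 and scanning for the first in-frame stop codon. This is a sketch under stated assumptions: cDNA nucleotide 1 is taken to be the A of the ATG at residue 95 of SEQ ID NO. 1, and the function names are hypothetical.

```python
STOP_CODONS = {"taa", "tag", "tga"}

# Residues 1681-1740 of SEQ ID NO. 1, copied from the sequence listing.
SEGMENT = "tgccaactggctggagattaatccggacactggtgccatttccactcgggctgagctgga"
SEGMENT_START = 1681  # position of SEGMENT[0] within SEQ ID NO. 1
ATG_POSITION = 95     # assumption: the A of the ATG start codon is cDNA nucleotide 1
FIRST_CDNA = SEGMENT_START - ATG_POSITION + 1  # cDNA coordinate of SEGMENT[0]


def insert_after(segment, cdna_position, base):
    """Insert one base immediately after the given cDNA position."""
    cut = cdna_position - FIRST_CDNA + 1
    return segment[:cut] + base + segment[cut:]


def first_stop_codon(segment):
    """Return the codon number of the first in-frame stop codon, or None."""
    phase = (-(FIRST_CDNA - 1)) % 3  # index of the first base that starts a codon
    for i in range(phase, len(segment) - 2, 3):
        if segment[i:i + 3] in STOP_CODONS:
            return (FIRST_CDNA + i - 1) // 3 + 1
    return None


wild_type_stop = first_stop_codon(SEGMENT)
mutant_stop = first_stop_codon(insert_after(SEGMENT, 1588, "c"))
print(wild_type_stop, mutant_stop)
```

In this stretch the wild-type reading frame contains no stop codon, whereas the 1588insC frame reaches a TAA stop at codon 536, six codons after the insertion. Because the insertion lies in a short run of cytosines, placing the extra C anywhere within that run gives the same mutant sequence.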

Sequencing genomic DNA from peripheral white blood cells of the proband of family
4201 (II-1) identified a heterozygous G->T transversion at nucleotide 70 (70G->T) in
exon 2 (Fig. 8b). The proband is unaffected but is an obligate carrier of the
predisposing mutation. The mutation would convert a glutamic acid (Glu24) to a
TAG stop codon in the signal peptide of the E-cadherin precursor protein. This
mutation was also identified in microdissected normal tissue from gastric biopsies of
three siblings (III-1, III-2, III-5) with gastric cancer, and peripheral white blood cell
DNA from an unaffected sibling (III-4) and a first cousin (III-7) affected by breast
cancer. DNA from blood of one unaffected family sibling (III-6) showed no mutation.
No other biological samples were available from any of the other family members.
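
As an illustration, the 70G->T change can be applied directly to the start of the coding sequence in SEQ ID NO. 1. The sketch assumes that translation begins at the first ATG in the excerpt below (residue 95 of SEQ ID NO. 1) and that this A is nucleotide 1; codon_at is a hypothetical helper.

```python
# Residues 61-180 of SEQ ID NO. 1, copied from the sequence listing.
SEGMENT = (
    "ggcgcctgccctcgctcggcgtccccggccagccatgggcccttggagccgcagcctctc"
    "ggcgctgctgctgctgctgcaggtctcctcttggctctgccaggagccggagccctgcca"
)

# Assumption: the first ATG in this excerpt is the translation start.
coding = SEGMENT[SEGMENT.find("atg"):]


def codon_at(coding_seq, nucleotide):
    """Return (codon number, codon) for a 1-based coding position."""
    start = (nucleotide - 1) // 3 * 3
    return start // 3 + 1, coding_seq[start:start + 3]


position = 70
assert coding[position - 1] == "g"  # the wild-type base at nucleotide 70 is G
mutant = coding[:position - 1] + "t" + coding[position:]

wild_type_codon = codon_at(coding, position)
mutant_codon = codon_at(mutant, position)
print(wild_type_codon, mutant_codon)
```

Under these assumptions nucleotide 70 is the first base of codon 24, which reads GAG (Glu) in the wild-type sequence and TAG (stop) after the substitution, matching the E24X designation.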

The E-cadherin gene was PCR amplified using peripheral white blood cell DNA from
the proband of family CHG 72 (II-4). Sequencing identified a heterozygous G->A
transition in the splice donor site of intron 8 (1137+1G->A). Guanine at the +1
position of the splice consensus sequence is 100% conserved in eukaryotic splice
sites (Padgett et al, (1986)). The G->A change would be predicted to result in either
skipping of exon 8 or the activation of cryptic splice sites. This mutation was
identified in DNA from normal and microdissected tumour tissue from paraffin
blocks in three additional affected family members (II-2, II-3, III-1). Loss of
heterozygosity (LOH) analysis using the microsatellite repeat markers D16S3138,
D16S3019, and D16S3043 was performed on the microdissected tumours from
family CHG 72 and family 4201 but failed to show LOH.

In addition to the three mutations identified in these families, two silent 
polymorphisms were also identified in family 1000. The mutations and 
polymorphisms are summarized in Table 3. 

TABLE 3. Summary of mutations and polymorphisms identified in diffuse
gastric cancer families.

Family    Mutation/Polymorphism   Type                Location
1000      1588insC                Frameshift          exon 11
4201      70G->T (E24X)           Nonsense            exon 2
CHG 72    1137+1G->A              Donor splice site   intron 8
1000      2076C->T                Silent              exon 13
1000      1937-27T->G             Silent              intron 12

Table 3. Mutations and polymorphisms were identified by direct sequencing of the
16 E-cadherin exons. Nucleotide positions are as described in Berx et al, (1995).

Discussion

Inactivating germline mutations in the E-cadherin gene have been identified in three
of three US families of Caucasian and African American descent with histories
consistent with an autosomal dominant susceptibility to diffuse gastric cancer
(Lauren (1965)). Tumours in each of the three families were histologically defined as
signet ring adenocarcinomas (Watanabe et al, (1990)). These results, taken with the
earlier identification of E-cadherin germline mutations in three of three New
Zealand Maori families (who are Polynesian in origin) as reported in Section 1,
demonstrate that mutation of E-cadherin is a widespread determinant of inherited
susceptibility to diffuse gastric cancer, and its occurrence is independent of ethnic
origin. Germline E-cadherin mutation therefore genotypically defines an inherited
cancer syndrome. This syndrome is designated herein as hereditary diffuse gastric
cancer (HDGC).

The high incidence of early-onset breast cancer and unspecified leukemia in family
4201 suggests that non-gastric malignancies may also be associated with HDGC. In
addition, one presumed gene carrier in family 1000 had breast cancer prior to
developing gastric cancer. E-cadherin mutations have been described in over 50% of
sporadic lobular breast cancers (but not in other histopathological subtypes) (Berx
et al, (1996)), suggesting that mutation of the E-cadherin gene is required for the
onset or progression of this type of cancer. It is notable that of the six families with
E-cadherin mutations described above only one (#4201) has an extensive history of
cancer at sites other than the stomach. Members of that family carry truncating
mutations in the sequence encoding either the E-cadherin signal peptide or the
precursor sequence. The remaining mutations would be predicted to result in
truncated proteins containing at least part of the extracellular domain including the
HAV motif required for E-cadherin homophilic adhesion. Decapeptides containing
this motif are capable of inhibiting E-cadherin-mediated cell adhesion (Blaschuk et
al, (1990)).

INDUSTRIAL APPLICATION 

The above results demonstrate the role that germline mutations in the gene
encoding E-cadherin play in susceptibility to cancer, particularly HDGC. Further,
the high frequency of inactivation of the E-cadherin gene in many types of sporadic
tumours (Mareel et al, (1995)) suggests that mutations in this gene may also confer
inherited susceptibility to other cancers. These include cancers of the breast,
prostate, thyroid, liver, kidney, bladder and colon.

The demonstration that mutations in the gene encoding E-cadherin are predictive of
cancer susceptibility has a number of implications. As indicated above, the primary
implication is a method of detecting a predisposition to cancer.

Early at-risk determination provides the opportunity for early intervention. Carriers 
of the mutation could choose to have prophylactic surgery or chemopreventative 
treatment prior to the appearance of any malignancy. Testing also enables carriers 
to make important life decisions (e.g. child bearing) and will provide the opportunity
for pre-natal diagnosis. For non-carriers, testing will bring peace of mind and will 
remove the need for surveillance. 

The present invention will therefore mean that people from families with histories of 
familial cancer (such as HDGC) will be able to undergo tests which will search for 
the presence of E-cadherin gene mutations. 

The identification of E-cadherin as a cancer susceptibility gene has implications 
beyond early detection. The possibility of chemopreventative approaches to delay 
the onset of cancer is also raised. These approaches, which are based on the 
activity of the second copy of the E-cadherin gene, fall into two categories: (a) 
procedures to maintain the expression of the remaining normal E-cadherin gene
and (b) procedures to minimise the risk of mutation or loss of the normal allele. 

(a). Other than mutation, a number of mechanisms for down-regulating E-cadherin
expression are known. These mechanisms are either normal physiological responses
such as occur during wound repair or may be consequences of a disease process, as
is suggested by the hypermethylation of the E-cadherin gene in a proportion of
sporadic tumours. There is also evidence suggesting that E-cadherin can be stored
in the cell, possibly in an inactive form. Activation of one or more of these pathways
in a person already carrying a mutation in the gene may diminish the concentration
of E-cadherin below the minimum threshold to maintain normal cell adhesion. Since
tumourigenesis is a multi-step pathway, under-expression of E-cadherin in a cell
which has already acquired mutations in other tumour suppressor genes or
oncogenes will accelerate the onset of disease.

Compounds which can increase the expression, or prevent the decrease, of
E-cadherin would be potential cancer chemopreventative agents for carriers of
mutations in this gene. A number of chemicals are already known to up-regulate
E-cadherin:

Insulin-like growth factor-1
9-cis-retinoic acid and all-trans-retinoic acid
tangeretin
tamoxifen
γ-linolenic acid
calcium
relaxin
17-β estradiol

Alternatively, compounds which prevent wounding in the stomach, such as
anti-ulcer treatments, would be predicted to have a protective effect.

(b). Preventing loss of the second E-cadherin allele or other genes involved in the
pathway to tumourigenesis will delay the onset of cancer in carriers. Tissue which is
inflamed, or undergoing rapid regeneration, is more likely to acquire a mutation.
Treatments which prevent inflammation or the need for tissue repair should have a
protective effect. Therefore compounds which prevent gastritis, antibiotics which
eradicate the bacterium Helicobacter pylori (which causes inflammation and tissue
damage), and anti-ulcer treatments would all offer protection from additional
mutations.

There is also the possibility of a curative or corrective approach using gene therapy.
This will involve supplying wild-type E-cadherin function to an individual who
carries mutant E-cadherin alleles. Supplying such a function should suppress
neoplastic growth of the recipient cells. The wild-type E-cadherin gene or a part of
the gene may be introduced into cells within such an individual in a vector such
that the gene remains extrachromosomal. In such a situation, the gene will be
expressed by the cell from the extrachromosomal location. If a gene portion is
introduced and expressed in a cell carrying a mutant E-cadherin allele, the gene
portion should encode a part of the E-cadherin protein which is required for
non-neoplastic growth of the cell. More usual is the situation where the wild-type
E-cadherin gene or a part thereof is introduced into the mutant cell in such a way that
it recombines with the endogenous mutant E-cadherin gene present in the cell.
Such recombination requires a double recombination event which results in the
correction of the E-cadherin gene mutation. Vectors for introduction of genes both
for recombination and for extrachromosomal maintenance are known in the art, and
any suitable vector may be used. Methods for introducing DNA into cells such as
electroporation, calcium phosphate co-precipitation and viral transduction are
known in the art. Cells transformed with the wild-type E-cadherin gene can be used
as model systems to study cancer remission and drug treatments which promote
such remission.

As generally discussed above, the wild-type E-cadherin gene or fragment, where
applicable, may be employed in gene therapy methods in order to increase the
amount of the expression products of such genes in cancer cells. Such gene
therapy is particularly appropriate for use in pre-cancerous cells, in which
E-cadherin polypeptide is absent or diminished compared to normal cells. It may
also be useful to increase the level of expression of a given E-cadherin gene even in
those cells in which the mutant gene is expressed at a "normal" level, but the gene
product is not fully functional.

Gene therapy would be carried out according to generally accepted methods, for
example as described by Kren et al, (1998), or as described by Friedman in Therapy
for Genetic Disease, T. Friedman, ed., Oxford University Press (1991), pp 105-121.
Cells from a patient would be first analyzed by the methods described above, to
ascertain the production of E-cadherin polypeptide. A virus or plasmid vector,
containing a copy of the E-cadherin gene linked to expression control elements and
capable of replicating inside the target cells, is prepared. Suitable vectors are
known, such as disclosed in US Patent 5,252,479 and PCT published application
WO 93/07282. The vector is then injected into the patient, either locally at the site
of the target cells or systemically (in order to reach any target cells that may be at
remote sites). If the transfected gene is not permanently incorporated into the
genome of each of the targeted cells, the treatment may have to be repeated
periodically.

Gene transfer systems known in the art may be useful in the practice of the gene
therapy methods. These include viral and nonviral transfer methods. A number of
viruses have been used as gene transfer vectors, including papovaviruses (e.g. SV40,
Madzak et al, (1992)), adenovirus (Berkner (1992)), vaccinia virus (Moss (1992)),
adeno-associated virus (Muzyczka (1992)), herpesviruses including HSV and EBV
(Margolskee (1992); Johnson et al, (1992); Fink et al, (1992); Breakfield and Geller,
(1987); Freese et al, (1990)), and retroviruses of avian (Petropoulos et al, (1992)),
murine (Miller (1992)) and human origin (Shimada et al, (1991); Helseth et al,
(1990); Page et al, (1990); Buchschacher and Panganiban (1992)).

Nonviral gene transfer methods known in the art include chemical techniques such 
as calcium phosphate coprecipitation (Pellicer et al, (1980)); mechanical techniques, 
for example microinjection (Anderson et al, (1980)); membrane fusion-mediated 
transfer via liposomes (Lim et al, (1992)); and direct DNA uptake and
receptor-mediated DNA transfer (Wolff et al, (1990); Wu et al, (1991)). Viral-mediated gene
transfer can be combined with direct in vivo gene transfer using liposome delivery, 
allowing one to direct the viral vectors to the target cells. Alternatively, the 
retroviral vector producer cell line can be injected into the patient (Culver et al, 
1992). Injection of producer cells would then provide a continuous source of vector 
particles. 

In an approach which combines biological and physical gene transfer methods, 
plasmid DNA of any size is combined with a polylysine-conjugated antibody specific 
to the adenovirus hexon protein, and the resulting complex is bound to an 
adenovirus vector. The trimolecular complex is then used to infect cells. The 
adenovirus vector permits efficient binding, internalization, and degradation of the 
endosome before the coupled DNA is damaged. 

Liposome/DNA complexes have been shown to be capable of mediating direct in vivo
gene transfer. While in standard liposome preparations the gene transfer process is
nonspecific, localized in vivo uptake and expression have been reported in tumour
deposits, for example, following direct in situ administration (Nabel, 1992).

Corrective efforts need not always involve gene therapy. Peptides which have
wild-type E-cadherin activity can be supplied to cells which carry mutant or missing
E-cadherin alleles as an alternative approach to gene therapy. Such peptides can be
produced by expression of the cDNA sequence in bacteria, for example, using
known expression vectors and known techniques (Sambrook et al, (1989)).
Alternatively, E-cadherin polypeptide can be extracted from E-cadherin-producing
mammalian cells. In addition, the techniques of synthetic chemistry can be
employed to synthesize E-cadherin protein (Merrifield, (1963)).

Active E-cadherin molecules can be introduced into cells by microinjection or by use 
of liposomes, for example. Alternatively, some active molecules may be taken up by 
cells, actively or by diffusion. Extracellular application of the E-cadherin gene 
product may be sufficient to prevent tumour growth. Supply of molecules with
E-cadherin activity should lead to partial reversal of the risk of a later neoplastic state.
Other molecules with E-cadherin activity (for example, peptides, drugs or organic 
compounds) may also be used to effect such a reversal. Modified polypeptides 
having substantially similar function can also be used for peptide therapy. 

Still another implication of the applicant's finding is that cells which carry a mutant
E-cadherin allele can be used as model systems to study and test for substances
which have potential as prophylactic/therapeutic agents. The cells are typically
cultured epithelial cells. These may be isolated from individuals with E-cadherin
mutations, either somatic or germline. Alternatively, the cell line can be engineered
to carry the mutation in the E-cadherin allele. After a test substance is applied to
the cells, the neoplastically transformed phenotype of the cell is determined. Any
trait of neoplastically transformed cells can be assessed, including
anchorage-independent growth, tumourigenicity in nude mice, invasiveness of cells, and
growth factor dependence. Assays for each of these traits are known in the art.

Those persons skilled in the art will appreciate that the above description is
provided by way of example only and that the invention is limited only by the lawful
scope of the appended claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: University of Otago 

Te Wheta Whanau Trust Limited 

(ii) TITLE OF INVENTION: GERMLINE MUTATIONS IN THE E-CADHERIN 

GENE AND METHOD FOR DETECTING PREDISPOSITIONS TO 
CANCER 

(iii) NUMBER OF SEQUENCES: 7 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESS: Russell McVeagh West-Walker

(B) STREET: The Todd Building, Cnr Brandon Street and Lambton 

Quay 

(C) CITY: Wellington 

(D) COUNTRY: New Zealand 

(v) COMPUTER READABLE FORM 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: Windows 95 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: NZ 328994 

(B) FILING DATE: 17 October 1997 

(vii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Bennett, Michael Roy 

(B) REFERENCE/DOCKET NUMBER: 23677 MRB

(viii) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 64 4 499 9058 

(B) TELEFAX: 64 4 499 9306 



(2) INFORMATION FOR SEQ ID NO. 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4778 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO. 1:

gcttgcggaa gtcagttcag actccagccc gctccagccc ggcccgaccc gaccgcaccc 60 

ggcgcctgcc ctcgctcggc gtccccggcc agccatgggc ccttggagcc gcagcctctc 120 

ggcgctgctg ctgctgctgc aggtctcctc ttggctctgc caggagccgg agccctgcca 180 

ccctggcttt gacgccgaga gctacacgtt cacggtgccc cggcgccacc tggagagagg 240 

ccgcgtcctg ggcagagtga attttgaaga ttgcaccggt cgacaaagga cagcctattt 300 

ttccctcgac acccgattca aagtgggcac agatggtgtg attacagtca aaaggcctct 360 

acggtttcat aacccacaga tccatttctt ggtctacgcc tgggactcca cctacagaaa 420 

gttttccacc aaagtcacgc tgaatacagt ggggcaccac caccgccccc cgccccatca 480 

ggcctccgtt tctggaatcc aagcagaatt gctcacattt cccaactcct ctcctggcct 540 

cagaagacag aagagagact gggttattcc tcccatcagc tgcccagaaa atgaaaaagg 600 

cccatttcct aaaaacctgg ttcagatcaa atccaacaaa gacaaagaag gcaaggtttt 660 

ctacagcatc actggccaag gagctgacac accccctgtt ggtgtcttta ttattgaaag 720 

agaaacagga tggctgaagg tgacagagcc tctggataga gaacgcattg ccacatacac 780 

tctcttctct cacgctgtgt catccaacgg gaatgcagtt gaggatccaa tggagatttt 840 

gatcacggta accgatcaga atgacaacaa gcccgaattc acccaggagg tctttaaggg 900 

gtctgtcatg gaaggtgctc ttccaggaac ctctgtgatg gaggtcacag ccacagacgc 960 

ggacgatgat gtgaacacct acaatgccgc catcgcttac accatcctca gccaagatcc 1020 

tgagctccct gacaaaaata tgttcaccat taacaggaac acaggagtca tcagtgtggt 1080 

caccactggg ctggaccgag agagtttccc tacgtatacc ctggtggttc aagctgctga 1140

ccttcaaggt gaggggttaa gcacaacagc aacagctgtg atcacagtca ctgacaccaa 1200 

cgataatcct ccgatcttca atcccaccac gtacaagggt caggtgcctg agaacgaggc 1260 

taacgtcgta atcaccacac tgaaagtgac tgatgctgat gcccccaata ccccagcgtg 1320 

ggaggctgta tacaccatat tgaatgatga tggtggacaa tttgtcgtca ccacaaatcc 1380 

agtgaacaac gatggcattt tgaaaacagc aaagggcttg gattttgagg ccaagcagca 1440 

gtacattcta cacgtagcag tgacgaatgt ggtacctttt gaggtctctc tcaccacctc 1500 

cacagccacc gtcaccgtgg atgtgctgga tgtgaatgaa gcccccatct ttgtgcctcc 1560 

tgaaaagaga gtggaagtgt ccgaggactt tggcgtgggc caggaaatca catcctacac 1620 

tgcccaggag ccagacacat ttatggaaca gaaaataaca tatcggattt ggagagacac 1680 

tgccaactgg ctggagatta atccggacac tggtgccatt tccactcggg ctgagctgga 1740 

cagggaggat tttgagcacg tgaagaacag cacgtacaca gccctaatca tagctacaga 1800 

caatggttct ccagttgcta ctggaacagg gacacttctg ctgatcctgt ctgatgtgaa 1860 

tgacaacgcc cccataccag aacctcgaac tatattcttc tgtgagagga atccaaagcc 1920 

tcaggtcata aacatcattg atgcagacct tcctcccaat acatctccct tcacagcaga 1980 

actaacacac ggggcgagtg ccaactggac cattcagtac aacgacccaa cccaagaatc 2040

tatcattttg aagccaaaga tggccttaga ggtgggtgac tacaaaatca atctcaagct 2100

catggataac cagaataaag accaagtgac caccttagag gtcagcgtgt gtgactgtga 2160 

aggggccgcc ggcgtctgta ggaaggcaca gcctgtcgaa gcaggattgc aaattcctgc 2220 

cattctgggg attcttggag gaattcttgc tttgctaatt ctgattctgc tgctcttgct 2280 

gtttcttcgg aggagagcgg tggtcaaaga gcccttactg cccccagagg atgacacccg 2340 

ggacaacgtt tattactatg atgaagaagg aggcggagaa gaggaccagg actttgactt 2400 

gagccagctg cacaggggcc tggacgctcg gcctgaagtg actcgtaacg acgttgcacc 2460 

aaccctcatg agtgtccccc ggtatcttcc ccgccctgcc aatcccgatg aaattggaaa 2520 

ttttattgat gaaaatctga aagcggctga tactgacccc acagccccgc cttatgattc 2580 

tctgctcgtg tttgactatg aaggaagcgg ttccgaagct gctagtctga gctccctgaa 2640 

ctcctcagag tcagacaaag accaggacta tgactacttg aacgaatggg gcaatcgctt 2700 

caagaagctg gctgacatgt acggaggcgg cgaggacgac taggggactc gagagaggcg 2760 

ggccccagac ccatgtgctg ggaaatgcag aaatcacgtt gctggtggtt tttcagctcc 2820 

cttcccttga gatgagtttc tggggaaaaa aaagagactg gttagtgatg cagttagtat 2880 

agctttatac tctctccact ttatagctct aataagtttg tgttagaaaa gtttcgactt 2940 

atttcttaaa gctttttttt ttttcccatc actctttaca tggtggtgat gtccaaaaga 3000 

tacccaaatt ttaatattcc agaagaacaa ctttagcatc agaaggttca cccagcacct 3060 

tgcagatttt cttaaggaat tttgtctcac ttttaaaaag aaggggagaa gtcagctact 3120 

ctagttctgt tgttttgtgt atataatttt ttaaaaaaaa tttgtgtgct tctgctcatt 3180

actacactgg tgtgtccctc tgcctttttt ttttttttta agacagggtc tcattctatc 3240 

ggccaggctg gagtgcagtg gtgcaatcac agctcactgc agccttgtcc tcccaggctc 3300 

aagctatcct tgcacctcag cctcccaagt agctgggacc acaggcatgc accactacgc 3360 

atgactaatt ttttaaatat ttgagacggg gtctccctgt gttacccagg ctggtctcaa 3420 

actcctgggc tcaagtgatc ctcccatctt ggcctcccag agtattggga ttacagacat 3480 

gagccactgc acctgcccag ctccccaact ccctgccatt ttttaagaga cagtttcgct 3540 

ccatcgccca ggcctgggat gcagtgatgt gatcatagct cactgtaacc tcaaactctg 3600 

gggctcaagc agttctccca ccagcctcct ttttattttt ttgtacagat ggggtcttgc 3660 

tatgttgccc aagctggtct taaactcctg gcctcaagca atccttctgc cttggccccc 3720 

caaagtgctg ggattgtggg catgagctgc tgtgcccagc ctccatgttt taatatcaac 3780 

tctcactcct gaattcagtt gctttgccca agataggagt tctctgatgc agaaattatt 3840 

gggctctttt agggtaagaa gtttgtgtct ttgtctggcc acatcttgac taggtattgt 3900 

ctactctgaa gacctttaat ggcttccctc tttcatctcc tgagtatgta acttgcaatg 3960 

ggcagctatc cagtgacttg ttctgagtaa gtgtgttcat taatgtttat ttagctctga 4020 

agcaagagtg atatactcca ggacttagaa tagtgcctaa agtgctgcag ccaaagacag 4080

agcggaacta tgaaaagtgg gcttggagat ggcaggagag cttgtcattg agcctggcaa 4140 

tttagcaaac tgatgctgag gatgattgag gtgggtctac ctcatctctg aaaattctgg 4200 

aaggaatgga ggagtctcaa catgtgtttc tgacacaaga tccgtggttt gtactcaaag 4260 

cccagaatcc ccaagtgcct gcttttgatg atgtctacag aaaatgctgg ctgagctgaa 4320 

cacatttgcc caattccagg tgtgcacaga aaaccgagaa tattcaaaat tccaaatttt 4380 

ttcttaggag caagaagaaa atgtggccct aaagggggtt agttgagggg tagggggtag 4440 

tgaggatctt gatttggatc tctttttatt taaatgtgaa tttcaacttt tgacaatcaa 4500 

agaaaagact tttgttgaaa tagctttact gtttctcaag tgttttggag aaaaaaatca 4560 

accctgcaat cactttttgg aattgtcttg atttttcggc agttcaagct atatcgaata 4620 

tagttctgtg tagagaatgt cactgtagtt ttgagtgtat acatgtgtgg gtgctgataa 4680 

ttgtgtattt tctttggggg tggaaaagga aaacaattca agctgagaaa agtattctca 4740 

aagatgcatt tttataaatt ttattaaaca attttgtt 4778 



(2) INFORMATION FOR SEQ ID NO. 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 882 amino acids 

(B) TYPE: amino acid 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO. 2: 

Met Gly Pro Tip Ser Arg Ser Leu Ser Ala Leu Leu Leu Leu Leu Gin Val Ser 

5 10 15 

Ser Tip Leu Cys Gin Glu Pro Glu Pro Cys His Pro Gly Phe Asp Ala Glu Ser 

20 25 30 35 

Tyr Thr Phe Thr Val Pro Arg Arg His Leu Glu Arg Gly Arg Val Leu Gly Arg 

40 45 50 

Val Asn Phe Glu Asp Cys Thr Gly Arg Gin Arg Thr Ala Tyr Phe Ser Leu Asp 
55 60 65 70 

Thr Arg Phe Lys Val Gly Thr Asp Gly Val lie Thr Val Lys Arg Pro Leu Arg 
75 80 85 90 

Phe His Asn Pro Gin He His Phe Leu Val Tyr Ala Trp Asp Ser Thr Tyr Arg 

95 100 " 105 

Lys Phe Ser Thr Lys Val Thr Leu Asn Thr Val Gly His His His Arg Pro Pro 

110 115 120 125 

Pro His Gin Ala Ser Val Ser Gly lie Gin Ala Glu Leu Leu Thr Phe Pro Asn 

130 135 140 

Ser Ser Pro Gly Leu Arg Arg Gin Lys Arg Asp Trp Val lie Pro Pro He Ser Cys 
145 150 155 160 
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Pro Glu Asn Glu Lys Gly Pro Phe Pro Lys Asn Leu Val Gin lie Lys Ser Asn 

165 170 175 180 

Lys Asp Lys Glu Gly Lys Val Phe Tyr Ser He Thr Gly Gin Gly Ala Asp Thr 

185 190 195 

Pro Pro Val Gly Val Phe He lie Glu Arg Glu Thr Gly Trp Leu Lys Val Thr 
200 205 210 215 

Glu Pro Leu Asp Arg Glu Arg He Ala Thr Tyr Thr Leu Phe Ser His Ala Val 
220 225 230 235 

Ser Ser Asn Gly Asn Ala Val Glu Asp Pro Met Glu He Leu He Thr Val Thr 

240 245 250 

Asp Gin Asn Asp Asn Lys Pro Glu Phe Thr Gin Glu Val Phe Lys Gly Ser Val 

255 260 265 270 

Met Glu Gly Ala Leu Pro Gly Thr Ser Val Met Glu Val Thr Ala Thr Asp Ala 

275 280 285 

Asp Asp Asp Val Asn Thr Tyr Asn Ala Ala He Ala Tyr Thr lie Leu Ser Gin 
290 295 300 305 

Asp Pro Glu Leu Pro Asp Lys Asn Met Phe Thr He Asn Arg Asn Thr Gly Val 
310 315 320 325 

He Ser Val Val Thr Thr Gly Leu Asp Arg Glu Ser Phe Pro Thr Tyr Thr Leu 

330 335 340 

Val Val Gin Ala Ala Asp Leu Gin Gly Glu Gly Leu Ser Thr Thr Ala Thr Ala 

345 350 355 360 

Val lie Thr Val Thr Asp Thr Asn Asp Asn Pro Pro He Phe Asn Pro Thr Thr 

365 370 375 

Tyr Lys Gly Gln Val Pro Glu Asn Glu Ala Asn Val Val Ile Thr Thr Leu Lys
380 385 390 395 

Val Thr Asp Ala Asp Ala Pro Asn Thr Pro Ala Trp Glu Ala Val Tyr Thr Ile

400 405 410 415 

Leu Asn Asp Asp Gly Gly Gln Phe Val Val Thr Thr Asn Pro Val Asn Asn Asp

420 425 430 

Gly Ile Leu Lys Thr Ala Lys Gly Leu Asp Phe Glu Ala Lys Gln Gln Tyr Ile

435 440 445 450 

Leu His Val Ala Val Thr Asn Val Val Pro Phe Glu Val Ser Leu Thr Thr Ser 

455 460 465 

Thr Ala Thr Val Thr Val Asp Val Leu Asp Val Asn Glu Ala Pro Ile Phe Val
470 475 480 485 

Pro Pro Glu Lys Arg Val Glu Val Ser Glu Asp Phe Gly Val Gly Gln Glu Ile

490 495 500 505 

Thr Ser Tyr Thr Ala Gln Glu Pro Asp Thr Phe Met Glu Gln Lys Ile Thr Tyr

510 515 520 

Arg Ile Trp Arg Asp Thr Ala Asn Trp Leu Glu Ile Asn Pro Asp Thr Gly Ala Ile

525 530 535 540 

Ser Thr Arg Ala Glu Leu Asp Arg Glu Asp Phe Glu His Val Lys Asn Ser Thr 
545 550 555 560 

Tyr Thr Ala Leu Ile Ile Ala Thr Asp Asn Gly Ser Pro Val Ala Thr Gly Thr Gly

565 570 575 

Thr Leu Leu Leu Ile Leu Ser Asp Val Asn Asp Asn Ala Pro Ile Pro Glu Pro
580 585 590 595 

Arg Thr Ile Phe Phe Cys Glu Arg Asn Pro Lys Pro Gln Val Ile Asn Ile Ile Asp
600 605 610 615 

Ala Asp Leu Pro Pro Asn Thr Ser Pro Phe Thr Ala Glu Leu Thr His Gly Ala

620 625 630 

Ser Ala Asn Trp Thr Ile Gln Tyr Asn Asp Pro Thr Gln Glu Ser Ile Ile Leu Lys
635 640 645 650 

Pro Lys Met Ala Leu Glu Val Gly Asp Tyr Lys Ile Asn Leu Lys Leu Met Asp

655 660 665 670 

Asn Gln Asn Lys Asp Gln Val Thr Thr Leu Glu Val Ser Val Cys Asp Cys Glu

675 680 685 

Gly Ala Ala Gly Val Cys Arg Lys Ala Gln Pro Val Glu Ala Gly Leu Gln Ile Pro
690 695 700 705 

Ala Ile Leu Gly Ile Leu Gly Gly Ile Leu Ala Leu Leu Ile Leu Ile Leu Leu Leu

710 715 720 725 

Leu Leu Phe Leu Arg Arg Arg Ala Val Val Lys Glu Pro Leu Leu Pro Pro Glu 
730 735 740 745 

Asp Asp Thr Arg Asp Asn Val Tyr Tyr Tyr Asp Glu Glu Gly Gly Gly Glu Glu 

750 755 760 

Asp Gln Asp Phe Asp Leu Ser Gln Leu His Arg Gly Leu Asp Ala Arg

765 770 775 

Pro Glu Val Thr Arg Asn Asp Val Ala Pro Thr Leu Met Ser Val Pro Arg Tyr 
780 785 790 795 

Leu Pro Arg Pro Ala Asn Pro Asp Glu Ile Gly Asn Phe Ile Asp Glu Asn Leu
800 805 810 815 

Lys Ala Ala Asp Thr Asp Pro Thr Ala Pro Pro Tyr Asp Ser Leu Leu Val Phe 

820 825 830 

Asp Tyr Glu Gly Ser Gly Ser Glu Ala Ala Ser Leu Ser Ser Leu Asn Ser Ser 

835 840 845 850 

Glu Ser Asp Lys Asp Gln Asp Tyr Asp Tyr Leu Asn Glu Trp Gly Asn Arg Phe

855 860 865 

Lys Lys Leu Ala Asp Met Tyr Gly Gly Gly Glu Asp Asp 
870 875 880 



(2) INFORMATION FOR SEQ ID NO. 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO. 3: 
taacaggaac acaggagtca tca

(2) INFORMATION FOR SEQ ID NO. 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO. 4: 
gtggtgggat tgaagatcgg 



(2) INFORMATION FOR SEQ ID NO. 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO. 5: 
ttcccccacc ccaggtctc 



(2) INFORMATION FOR SEQ ID NO. 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO. 6: 
ccctcacctc tgcccaggac 

(2) INFORMATION FOR SEQ ID NO. 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO. 7: 
tgtagctctc ggcgtcaaag 
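
As a quick consistency check on the primer entries of SEQ ID NOs. 3 to 7, the following Python sketch compares each declared LENGTH with the length of the listed sequence and derives the reverse complement of a primer. The dictionary `PRIMERS` and the helper names are illustrative assumptions and not part of the specification. The final triplet of SEQ ID NO. 3 is read here as `tca`, since `e` is not a nucleotide symbol.

```python
# Primer sequences copied from SEQ ID NOs. 3 to 7 (spaces removed),
# paired with the LENGTH declared in each entry.
# The name PRIMERS is hypothetical; SEQ ID NO. 3 is read as ending in "tca".
PRIMERS = {
    3: ("taacaggaacacaggagtcatca", 23),
    4: ("gtggtgggattgaagatcgg", 20),
    5: ("ttcccccaccccaggtctc", 19),
    6: ("ccctcacctctgcccaggac", 20),
    7: ("tgtagctctcggcgtcaaag", 20),
}

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_lengths(primers: dict) -> dict:
    """Map each SEQ ID number to True if the sequence matches its declared length."""
    return {
        seq_id: len(seq) == declared
        for seq_id, (seq, declared) in primers.items()
    }
```

Under these assumptions `check_lengths(PRIMERS)` reports every entry as consistent, and `reverse_complement(PRIMERS[4][0])` gives `ccgatcttcaatcccaccac`.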

CLAIMS

1. A method of testing to detect whether a human subject is predisposed to 
cancer which comprises the step of detecting the presence or absence of an 
alteration in the gene encoding E-cadherin, wherein the presence of an 
alteration is indicative of a predisposition to cancer. 

2. A method for assessing a risk in a human subject for a predisposition for 
cancer which comprises the step of determining whether there is a germline
alteration in the gene encoding E-cadherin, wherein the presence of an
alteration is indicative of a risk for a predisposition for cancer. 

3. A method according to claim 1 or claim 2 wherein the presence or absence of an
alteration is determined by analysis of DNA coding for E-cadherin. 

4. A method according to claim 3 wherein the presence or absence of an 
alteration is determined by comparing the sequence of DNA from a sample 
from said subject with the DNA sequence coding for wild-type E-cadherin. 
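
The comparison recited in claim 4 can be illustrated with a minimal Python sketch, assuming the sample and wild-type sequences have already been aligned to the same length so that only substitutions need to be reported. The function name `find_substitutions` and the six-base fragments in the usage note are hypothetical and are not taken from the specification; insertions such as those listed in claim 10 would require an alignment step that this sketch does not attempt.

```python
def find_substitutions(wild_type: str, sample: str) -> list:
    """List (position, wild_type_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences are assumed to be pre-aligned
    and of equal length; case differences are ignored.
    """
    if len(wild_type) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, wt_base, sample_base)
        for position, (wt_base, sample_base) in enumerate(
            zip(wild_type.upper(), sample.upper()), start=1
        )
        if wt_base != sample_base
    ]
```

For example, `find_substitutions("CAGGAG", "CAGTAG")` returns `[(4, "G", "T")]`, flagging a single G -> T substitution at the fourth base of the hypothetical fragment.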

5. A method according to claim 1 or claim 2 wherein the presence or absence 
of an alteration is determined by analysis of mRNA transcribed from DNA 
encoding E-cadherin. 

6. A method according to claim 5 wherein the presence or absence of an 
alteration is determined by comparing the sequence of mRNA from a sample 
from said subject with the mRNA sequence transcribed from DNA coding for 
wild-type E-cadherin. 

7. A method according to claim 1 or claim 2 in which the presence or absence 
of an alteration is determined by analysis of the amino acid sequence of the 
expressed E-cadherin protein. 

8. A method according to claim 7 wherein the presence or absence of an 
alteration is determined by comparing the amino acid sequence of the 
expressed E-cadherin protein from a sample from said subject with the 
amino acid sequence of wild-type E-cadherin protein. 

9. A method according to claim 1 or claim 2 wherein the presence or absence
of an alteration is determined by comparing the level of expression and/or
activity of E-cadherin protein present in a sample from said subject with the
level of expression/activity of wild-type E-cadherin protein.

10. A method according to claim 1 or claim 2 in which the presence of one or 
more of the following alterations in the gene encoding E-cadherin is 
indicative of a predisposition to cancer: 

(i) G -> T substitution at nucleotide 1008 (exon 7);

(ii) C insertion between nucleotides 2,382-2,386 (exon 15);

(iii) C -> T substitution at nucleotide 2095 (exon 13);

(iv) C insertion at nucleotide 1588 (exon 11);

(v) G -> T substitution at nucleotide 70 (exon 2); and

(vi) G -> A substitution at nucleotide 1137 + 1 (donor splice site, intron 8).
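
The effect of two of the listed substitutions on the encoded protein can be worked through with a short Python sketch. It assumes that the nucleotide numbering of claim 10 follows the coding sequence of FIG 1 (A of the ATG start codon = 1), so that nucleotide 70 is the first base of codon 24 (GAG, Glu) and nucleotide 2095 is the first base of codon 699 (CAG, Gln). The table construction and the function name `substitute_in_codon` are illustrative assumptions. The sketch applies only the standard genetic code and makes no statement about splicing or other effects.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitute_in_codon(codon: str, codon_start: int, position: int, new_base: str):
    """Apply a single-base substitution inside one codon.

    codon_start is the 1-based nucleotide number of the codon's first base,
    position is the 1-based nucleotide number being substituted.
    Returns (wild_type_residue, mutant_residue) as one-letter codes.
    """
    offset = position - codon_start
    if not 0 <= offset < 3:
        raise ValueError("position lies outside the codon")
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    return CODON_TABLE[codon], CODON_TABLE[mutated]
```

Under these assumptions, `substitute_in_codon("GAG", 70, 70, "T")` returns `("E", "*")` and `substitute_in_codon("CAG", 2095, 2095, "T")` returns `("Q", "*")`: each substitution would replace the wild-type codon with the stop codon TAG.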

11. A method according to any one of the preceding claims wherein the 
presence of an alteration is indicative of a predisposition, or a risk of 
predisposition, for gastric cancer. 

12. A method according to claim 11 wherein the gastric cancer is hereditary
diffuse gastric cancer (HDGC).

13. A method according to any one of claims 1 to 10 wherein the presence of an
alteration is indicative of a predisposition, or a risk of predisposition, for
colorectal cancer.

14. A method according to any one of claims 1 to 10 wherein the presence of an
alteration is indicative of a predisposition, or a risk of predisposition, for
breast cancer.

15. A method according to claim 10 in which the presence of one or more of
alterations (i) to (vi) in the gene encoding E-cadherin is indicative of a
predisposition to hereditary diffuse gastric cancer (HDGC).

16. A method of prophylactic and/or therapeutic treatment against cancer of an
individual identified as having a risk of predisposition to cancer by a method
according to any preceding claim which comprises the step of increasing,
maintaining and/or restoring the active concentration of wild-type
E-cadherin protein within said individual.

17. A method of prophylactic and/or therapeutic treatment against hereditary
diffuse gastric cancer (HDGC) of an individual identified as having a risk of
predisposition to cancer by a method according to claim 12 or claim 15
which comprises the step of increasing, maintaining and/or restoring the
active concentration of wild-type E-cadherin protein within said individual.

18. A method of treatment according to claim 16 or claim 17 which comprises 
supplying said individual with wild-type E-cadherin gene function. 

E-Cadherin mRNA coding Translated Sequence

Sequence Range: 1 to 2649

10 20 30 40 50 60
ATGGGCCCTT GGAGCCGCAG CCTCTCGGCG CTGCTGCTCC TGCTGCAGGT CTCCTCTTGG
MetGlyPro TrpSerArgSer LeuSerAla LeuLeuLeu LeuLeuGlnVal SerSerTrp>

70 80 90 100 110 120
CTCTGCCAGG AGCCGGAGCC CTGCCACCCT GGCTTTGACG CCGAGAGCTA CACGTTCACG
LeuCysGln GluProGluPro CysHisPro GlyPheAsp AlaGluSerTyr ThrPheThr>

130 140 150 160 170 180
GTGCCCCGGC GCCACCTGGA GAGAGGCCGC GTCCTGGGCA GAGTGAATTT TGAAGATTGC
ValProArg ArgHisLeuGlu ArgGlyArg ValLeuGly ArgValAsnPhe GluAspCys>

190 200 210 220 230 240
ACCGGTCGAC AAAGGACAGC CTATTTCTCC CTCGACACCC GATTCAAAGT GGGCACAGAT
ThrGlyArg GlnArgThrAla TyrPheSer LeuAspThr ArgPheLysVal GlyThrAsp>

250 260 270 280 290 300
GGTGTGATTA CAGTCAAAAG GCCTCTACGG TTTCATAACC CACAGATCCA TTTCTTGGTC
GlyValIle ThrValLysArg ProLeuArg PheHisAsn ProGlnIleHis PheLeuVal>

310 320 330 340 350 360
TACGCCTGGG ACTCCACCTA CAGAAAGTTT TCCACCAAAG TCACGCTGAA TACAGTGGGG
TyrAlaTrp AspSerThrTyr ArgLysPhe SerThrLys ValThrLeuAsn ThrValGly>

370 380 390 400 410 420
CACCACCACC GCCCCCCGCC CCATCAGGCC TCCGTNTCTG GAATCCAAGC AGAATTGCTC
HisHisHis ArgProProPro HisGlnAla SerValSer GlyIleGlnAla GluLeuLeu>

430 440 450 460 470 480
ACATTTCCCA ACTCCTCTCC NGGCCTCAGA AGACAGAAGA GAGACTGGGT TATTCCTCCC
ThrPhePro AsnSerSerPro GlyLeuArg ArgGlnLys ArgAspTrpVal IleProPro>

490 500 510 520 530 540
ATCAGCTGCC CAGAAAATGA AAAAGGCCCA TTTCCTAAAA ACCTGGTTCA GATCAAATCC
IleSerCys ProGluAsnGlu LysGlyPro PheProLys AsnLeuValGln IleLysSer>

550 560 570 580 590 600
AACAAAGACA AAGAAGGCAA GGTTTTCTAC AGCATCACTG GCCAAGGAGC TGACACACCC
AsnLysAsp LysGluGlyLys ValPheTyr SerIleThr GlyGlnGlyAla AspThrPro>

610 620 630 640 650 660
CCTGTTGGTG TCTTTATTAT TGAAAGAGAA ACAGGATGGC TGAAGGTGAC AGAGCCTCTG
ProValGly ValPheIleIle GluArgGlu ThrGlyTrp LeuLysValThr GluProLeu>

670 680 690 700 710 720
GATAGAGAAC GCATTGCCAC ATACACTCTC TTCTCTCACG CTGTNTCATC CAACGGGAAT
AspArgGlu ArgIleAlaThr TyrThrLeu PheSerHis AlaValSerSer AsnGlyAsn>

730 740 750 760 770 780
GCAGTTGAGG ATCCAATGGA GATTTTGATC ACGGTAACCG ATCAGAATGA CAACAAGCCC
AlaValGlu AspProMetGlu IleLeuIle ThrValThr AspGlnAsnAsp AsnLysPro>

790 800 810 820 830 840
GAATTCACCC AGGAGGTCTT TAAGGGGTCT GTCATGGAAG GTGCTCTTCC AGGAACCTCT
GluPheThr GlnGluValPhe LysGlySer ValMetGlu GlyAlaLeuPro GlyThrSer>

850 860 870 880 890 900
GTGATGGAGG TCACAGCCAC AGACGCGGAC GATGATGTGA ACACCTACAA TGCCGCCATC
ValMetGlu ValThrAlaThr AspAlaAsp AspAspVal AsnThrTyrAsn AlaAlaIle>

910 920 930 940 950 960
GCTTACACCA TCCTCAGCCA AGATCCTGAG CTCCCTGACA AAAATATGTT CACCATTAAC
AlaTyrThr IleLeuSerGln AspProGlu LeuProAsp LysAsnMetPhe ThrIleAsn>

970 980 990 1000 1010 1020
AGGAACACAG GAGTCATCAG TGTGGTCACC ACTGGGCTGG ACCGAGAGAG TTTCCCTACG
ArgAsnThr GlyValIleSer ValValThr ThrGlyLeu AspArgGluSer PheProThr>

1030 1040 1050 1060 1070 1080
TATACCCTGG TGGTTCAAGC TGCTGACCTT CAAGGTGAGG GGTTAAGCAC AACAGCAACA
TyrThrLeu ValValGlnAla AlaAspLeu GlnGlyGlu GlyLeuSerThr ThrAlaThr>

1090 1100 1110 1120 1130 1140
GCTGTGATCA CAGTCACTGA CACCAACGAT AATCCTCCGA TCTTCAATCC CACCACGTAC
AlaValIle ThrValThrAsp ThrAsnAsp AsnProPro IlePheAsnPro ThrThrTyr>

1150 1160 1170 1180 1190 1200
AAGGGTCAGG TGCCTGAGAA CGAGGCTAAC GTCGTAATCA CCACACTGAA AGTGACTGAT
LysGlyGln ValProGluAsn GluAlaAsn ValValIle ThrThrLeuLys ValThrAsp>

1210 1220 1230 1240 1250 1260
GCTGATGCCC CCAATACCCC AGCGTGGGAG GCTGTATACA CCATATTGAA TGATGATGGT
AlaAspAla ProAsnThrPro AlaTrpGlu AlaValTyr ThrIleLeuAsn AspAspGly>

1270 1280 1290 1300 1310 1320
GGACAATTTG TCGTCACCAC AAATCCAGTG AACAACGATG GCATTNTGAA AACAGCAAAG
GlyGlnPhe ValValThrThr AsnProVal AsnAsnAsp GlyIleLeuLys ThrAlaLys>

1330 1340 1350 1360 1370 1380
GGCTTGGATT TTGAGGCCAA GCAGCAGTAC ATTCTACACG TAGCAGTGAC GAATGTGGTA
GlyLeuAsp PheGluAlaLys GlnGlnTyr IleLeuHis ValAlaValThr AsnValVal>

1390 1400 1410 1420 1430 1440
CCTTTTGAGG TCTCTCTCAC CACCTCCACA GCCACCGTCA CCGTGGATGT GCTGGATGTG
ProPheGlu ValSerLeuThr ThrSerThr AlaThrVal ThrValAspVal LeuAspVal>

1450 1460 1470 1480 1490 1500
AATGAAGCCC CCATCTTTGT GCCTCCTGAA AAGAGAGTGG AAGTGTCCGA GGACTTTGGC
AsnGluAla ProIlePheVal ProProGlu LysArgVal GluValSerGlu AspPheGly>

1510 1520 1530 1540 1550 1560
GTGGGCCAGG AAATCACATC CTACACTGCC CAGGAGCCAG ACACATTTAT GGAACAGAAA
ValGlyGln GluIleThrSer TyrThrAla GlnGluPro AspThrPheMet GluGlnLys>

1570 1580 1590 1600 1610 1620
ATAACATATC GGATTTGGAG AGACACTGCC AACTGGCTGG AGATTAATCC GGACACTGGT
IleThrTyr ArgIleTrpArg AspThrAla AsnTrpLeu GluIleAsnPro AspThrGly>

1630 1640 1650 1660 1670 1680
GCCATTTCCA CTCGGGCTGA GCTGGACAGG GAGGATTTTG AGCACGTGAA GAACAGCACG
AlaIleSer ThrArgAlaGlu LeuAspArg GluAspPhe GluHisValLys AsnSerThr>

1690 1700 1710 1720 1730 1740
TACACAGCCC TAATCATAGC TACAGACAAT GGTTCTCCAG TTGCTACTGG AACAGGGACA
TyrThrAla LeuIleIleAla ThrAspAsn GlySerPro ValAlaThrGly ThrGlyThr>

1750 1760 1770 1780 1790 1800
CTTCTGCTGA TCCTCTCTGA TGTGAATGAC AACGCCCCCA TACCAGAACC TCGAACTATA
LeuLeuLeu IleLeuSerAsp ValAsnAsp AsnAlaPro IleProGluPro ArgThrIle>

1810 1820 1830 1840 1850 1860
TTCTTCTGTG AGAGGAATCC AAAGCCTCAG GTCATAAACA TCATTGATGC AGACCTTCCT
PhePheCys GluArgAsnPro LysProGln ValIleAsn IleIleAspAla AspLeuPro>

1870 1880 1890 1900 1910 1920
CCCAATACAT CTCCCTTCAC AGCAGAACTA ACACACGGGG CGAGTGCCAA CTGGACCATT
ProAsnThr SerProPheThr AlaGluLeu ThrHisGly AlaSerAlaAsn TrpThrIle>

1930 1940 1950 1960 1970 1980
CAGTACAACG ACCCAACCCA AGAATCTATC ATTTTGAAGC CAAAGATGGC CTTAGAGGTG
GlnTyrAsn AspProThrGln GluSerIle IleLeuLys ProLysMetAla LeuGluVal>

1990 2000 2010 2020 2030 2040
GGTGACTACA AAATCAATCT CAAGCTCATG GATAACCAGA ATAAAGACCA AGTGACCACC
GlyAspTyr LysIleAsnLeu LysLeuMet AspAsnGln AsnLysAspGln ValThrThr>

2050 2060 2070 2080 2090 2100
TTAGAGGTCA GCGTGTGTGA CTGTGAAGGG GCCGCCGGCG TCTGTAGGAA GGCACAGCCT
LeuGluVal SerValCysAsp CysGluGly AlaAlaGly ValCysArgLys AlaGlnPro>

2110 2120 2130 2140 2150 2160
GTCGAAGCAG GATTGCAAAT TCCTGCCATT CTGGGGATTC TTGGAGGAAT TCTTGCTTTG
ValGluAla GlyLeuGlnIle ProAlaIle LeuGlyIle LeuGlyGlyIle LeuAlaLeu>

2170 2180 2190 2200 2210 2220
CTAATTCTGA TTCTGCTGCT CTTGCTGTTT CTTCGGAGGA GAGCGGTGGT CAAAGAGCCC
LeuIleLeu IleLeuLeuLeu LeuLeuPhe LeuArgArg ArgAlaValVal LysGluPro>

2230 2240 2250 2260 2270 2280
TTACTGCCCC CAGAGGATGA CACCCGGGAC AACGTTTATT ACTATGATGA AGAAGGAGGC
LeuLeuPro ProGluAspAsp ThrArgAsp AsnValTyr TyrTyrAspGlu GluGlyGly>

2290 2300 2310 2320 2330 2340
GGAGAAGAGG ACCAGGACTT TGACTTGAGC CAGCTGCACA GGGGCCTGGA CGCTCGGCCT
GlyGluGlu AspGlnAspPhe AspLeuSer GlnLeuHis ArgGlyLeuAsp AlaArgPro>

2350 2360 2370 2380 2390 2400
GAAGTGACTC GTAACGACGT TGCACCAACC CTCATGAGTG TCCCCCGGTA TCTTCCCCGC
GluValThr ArgAsnAspVal AlaProThr LeuMetSer ValProArgTyr LeuProArg>

2410 2420 2430 2440 2450 2460
CCTGCCAATC CCGATGAAAT TGGAAATTTT ATTGATGANA ATCTGAAAGC GGCTGATACT
ProAlaAsn ProAspGluIle GlyAsnPhe IleAspGlu AsnLeuLysAla AlaAspThr>

2470 2480 2490 2500 2510 2520
GACCCCACAG CCCCGCCTTA TGATTCTCTG CTCGTGTTTG ACTATGAAGG AAGCGGTTCC
AspProThr AlaProProTyr AspSerLeu LeuValPhe AspTyrGluGly SerGlySer>

2530 2540 2550 2560 2570 2580
GAAGCTGCTA GTCTGAGCTC CCTGAACTCC TCAGAGTCAG ACAAAGACCA GGACTATGAC
GluAlaAla SerLeuSerSer LeuAsnSer SerGluSer AspLysAspGln AspTyrAsp>

2590 2600 2610 2620 2630 2640
TACTTGAACG AATGGGGCAA TCGCTTCAAG AAGCTGGCTG ACATGTACGG AGGCGGCGAG
TyrLeuAsn GluTrpGlyAsn ArgPheLys LysLeuAla AspMetTyrGly GlyGlyGlu>

GACGACTAG
AspAsp***>



FIG 1 (cont'd) 



WO 99/20168 



PCT/NZ98/00160 



5/11 




FIG 2 




[Graph: axis label "number of cases"; tick values not legible]



FIG 3 

FIG 4a



Genomic DNA 

FIG 6A



FIG 6B 